# Real-time voice interaction

Voila Autonomous Preview
MIT
Voila is a large family of speech-language foundation models designed to enhance human-computer interaction, supporting real-time, low-latency voice interaction and multilingual processing.
Text-to-Audio, Transformers, Supports Multiple Languages
maitrix-org
332
8
Voila Audio Alpha
MIT
Voila is a large family of speech-language foundation models designed to enhance human-computer interaction, supporting real-time, low-latency voice interaction and multilingual processing.
Text-to-Audio, Transformers, Supports Multiple Languages
maitrix-org
175
3
Voila Chat
MIT
Voila is a new family of large-scale speech-language foundation models designed for human-computer voice interaction.
Text-to-Audio, Transformers, Supports Multiple Languages
maitrix-org
2,423
32
SeaLLMs Audio 7B
Other
SeaLLMs-Audio is a large-scale audio language model targeting Southeast Asia. It supports five major languages: Indonesian, Thai, Vietnamese, English, and Chinese, and has capabilities such as audio analysis and voice interaction.
Audio-to-Text, Safetensors, Supports Multiple Languages
SeaLLMs
539
10
Voila Tokenizer
MIT
Voila is a large-scale voice-language foundation model series designed to enhance human-computer interaction, supporting multiple audio tasks and languages.
Text-to-Audio, Transformers, Supports Multiple Languages
maitrix-org
4,912
3
AST Finetuned Speech Commands V2
BSD-3-Clause
An audio spectrogram transformer model fine-tuned on the Speech Commands v2 dataset for audio classification tasks, achieving 98.12% accuracy.
Audio Classification, Transformers
MIT
10.94k
15
© 2025 AIbase